\documentclass[journal]{IEEEtran}

%------------------------------------------------------------------------------
\usepackage{cite}
\usepackage{amsmath}

% \newtheorem{definition}{Definition}
% \usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{url}
\usepackage{balance}
% \usepackage{float}
% \usepackage{threeparttable}
% \usepackage{multirow}
% \usepackage{epstopdf}
% \usepackage{mathptmx}
% \usepackage[scaled=.90]{helvet}
% \usepackage{courier}
\usepackage{listings}
\lstset{
   language=C,
   basicstyle=\small,
   keywordstyle=\bfseries,
   identifierstyle=\ttfamily,
   stringstyle=\ttfamily,
   numbers=left,
   numberstyle=\tiny,
   stepnumber=1,
   numbersep=-5pt,
   showstringspaces=false
%   frame=single %trbl%
}

% \usepackage[normalem]{ulem}

\usepackage{color}

%------------------------------------------------------------------------------

\newcommand{\etal}{\emph{et al.}}
\newcommand{\eg}{\emph{e.g.}}
\newcommand{\ie}{\emph{i.e.}}
\newcommand{\etc}{\emph{etc.}}
\newcommand{\cf}{\emph{cf.}}

%------------------------------------------------------------------------------

\begin{document}


%------------------------------------------------------------------------------

\title{Runtime Tunable Transmitting Power Technique in mm-Wave WiNoC Architectures}

\author{Andrea~Mineo,~\IEEEmembership{Student Member,~IEEE,}
  Maurizio~Palesi,~\IEEEmembership{Member,~IEEE,}
  Giuseppe~Ascia,~\IEEEmembership{Member,~IEEE,}
  and~Vincenzo~Catania\thanks{A.~Mineo, G.~Ascia and V.~Catania are
    with the Dipartimento di Ingegneria Elettrica, Elettronica e
    Informatica, University of Catania, Catania, Italy (email:
    \{gascia,vcatania\}@dieei.unict.it). M.~Palesi is with Kore
    University, Enna (email: maurizio.palesi@unikore.it).}}

\maketitle

\markboth{IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. xx, NO. xx, FEBRUARY 2015}%
{Mineo \MakeLowercase{\textit{et al.}}: Runtime Tunable Transmitting Power Technique in mm-Wave WiNoC Architectures}

%---------------------------------------------------------------------

\begin{abstract}
  Emerging on-chip communication technologies, like wireless
  Networks-on-Chip (WiNoCs), have been recently proposed as candidate
  solutions for addressing the scalability limitations of conventional
  multi-hop NoC architectures. In a WiNoC, a subset of network nodes,
  namely, radio hubs, are equipped with a wireless interface which
  allows them to wirelessly communicate with other radio hubs. Thus,
  long-range communications, which would involve multiple hops in a
  conventional wireline NoC, can be realized by a single hop through
  the radio medium. Unfortunately, the energy consumed by the RF
  transceiver in the radio hub (\ie, the main building block in a
  WiNoC), and in particular by its transmitter, accounts for a
  significant fraction of the overall communication energy. To reduce
  this contribution, this paper presents a runtime tunable
  transmitting power technique for improving the energy efficiency of
  the transceiver in WiNoC architectures. The basic idea is to tune
  the transmitting power based on the physical location of the
  recipient of the current communication. Specifically, based on the
  destination address of the incoming packet, the radio hub sets its
  transmitting power to the minimum level that still reaches the
  destination antenna without exceeding a given bit error rate. The
  proposed technique is general and can be applied to any WiNoC
  architecture. Its application to different representative WiNoC
  architectures results in an average energy reduction of up to 50\%
  without any impact on performance and with a negligible overhead in
  terms of silicon area.
%% In the last few years, many commercial multiprocessors System-on-Chip
%% (MPSoCs) which use a Network-on-Chip (NoC) as interconnection backbone
%% have been released from leading chip vendors. In modern CMOS
%% technologies, the integration density continues to increase while
%% limitations due to the wiring interconnect become a bottleneck
%% especially in multi-hop intra-chip communications. Emerging
%% architectures, such as Wireless NoC (WiNoC), represent the candidate
%% solutions to deal with the communication latency issues that
%% characterize such many-core architectures. In WiNoC, metallic wires
%% are replaced with long-range radio interconnections. Unfortunately,
%% the energy consumed by the RF transceiver (\ie, the main building
%% block in a WiNoC), and in particular by its transmitter, accounts for
%% a significant fraction of the overall communication energy.  In order
%% to alleviate such contribution, this paper proposes a runtime tunable
%% transmitting power technique for improving the energy efficiency of
%% the transceiver in WiNoC architectures. The basic idea is tuning the
%% transmitting power based on the physical location of the recipient of
%% the current communication. The proposed technique is general and can
%% be applied to any WiNoC architecture. Its application on different
%% representative WiNoC architectures resulted in an average energy
%% reduction up to 50\% without any impact on performance and with a
%% negligible overhead in terms of silicon area.
\end{abstract}

%------------------------------------------------------------------------------

\begin{IEEEkeywords}
Network on Chip, wireless on-chip communication, energy reduction.
\end{IEEEkeywords}

%------------------------------------------------------------------------------

\section{Introduction}
%% Multiprocessors System-on-Chip (MPSoCs) which use a Network-on-Chip
%% (NoC) as interconnection fabric, are now a commercial reality. For
%% instance, TILERA has released a 72-core processor~\cite{tilera_72},
%% while Intel has leveraged its research results with Teraflop and
%% SCC~\cite{vangal_jssc08, intel_scc}, releasing a series of coprocessors
%% named Xeon Phi~\cite{intel_xeonphi} with 61 cores which use a ring
%% based NoC as on-chip communication backbone. Adapteva has launched in
%% the market a multicore parallel computing fabric which consists of a
%% 2D array of compute nodes connected by a low-latency mesh
%% NoC~\cite{adapteva64}.

%% As the number of cores integrated into the same chip increases, the
%% role played by the on-chip communication system becomes more and more
%% important. The cost (\ie, silicon area), the performance (\eg,
%% communication delay, throughput, \etc), and the energy consumption of
%% the NoC are common design optimization metrics.

%% For instance, with regard to the communication performance,
%% as the network size increases, due to the multi-hop communication
%% nature of Network-on-Chip (NoC) based systems, the communication
%% latency increases. Another issue regards the metal global wires used
%% in traditional CMOS technology.  In particular, while the integration
%% density and the devices speed increas, on the other hand the
%% electrical interconnections become the bottleneck in terms of delay
%% and power consumption~\cite{ho_poieee01}. To face with this problem,
%% three-dimensional integration, nanophotonic communication, and
%% RF/wireless interconnects are emerging as technological alternatives
%% to the metal/dielectric system. In particular, RF/wireless
%% interconnects can be divided in two main families called
%% RF-I~\cite{chang_hpca08} and Wireless NoC (WiNoC)~\cite{zhao_tc08}.

\IEEEPARstart{A}{s} the number of cores integrated into the same chip
increases, the role played by the on-chip communication system becomes
more and more important. In fact, in the manycore era, the
communication needs rise dramatically, to the point of turning
communication into the major bottleneck in terms of cost, performance,
and energy consumption. Current manycore architectures use a
Network-on-Chip (NoC) as communication backbone. Unfortunately, as the
network size increases, due to the multi-hop communication nature of
NoC-based systems, the communication latency increases. Further, as
technology shrinks into the ultra-deep sub-micron (UDSM) regime,
electrical interconnections become the real bottleneck in terms of
delay and power consumption~\cite{ho_poieee01}.

To address these problems, three-dimensional integration,
nanophotonic communication, and RF/wireless interconnects are emerging
as technological alternatives to the metal/dielectric system. This
paper focuses on RF/wireless interconnects, which can be divided into
two main families, namely, RF-I~\cite{chang_hpca08} and Wireless NoC
(WiNoC)~\cite{zhao_tc08}. The first one is based on the propagation,
at the effective speed of light, of an electromagnetic wave through a
waveguide formed by two close conductors using standard CMOS
technology. The waveguide acts as a highway for the travelling
information. Although the RF-I solution has demonstrated its
effectiveness in terms of latency and low power
dissipation~\cite{chang_micro08}, its performance does not scale as
the number of communicating cores increases.

Scalability issues are solved by WiNoC
architectures. WiNoCs~\cite{zhao_tc08} use a wireless backbone on top
of the traditional wire-based NoC~\cite{deb_jetcas12}. A WiNoC
introduces new hardware structures, such as antennas and
transceivers, which represent an overhead in terms of area and
power. However, the use of concentrated architectures and
hierarchical topologies has proven to be an effective solution to the
antenna area overhead issue~\cite{ditommaso_hoti11}. With regard to
the power issue, the major contribution is due to the radio
transmitter front-end connected to the antenna. For instance,
in~\cite{yu_mwscas11} the transmitter is responsible for about 65\%
of the overall transceiver power consumption, while
in~\cite{daly_jssc07} this contribution is more than 74\%. Previous
works in the context of WiNoCs are based on a transmitter
architecture in which the transmitting power is kept constant
(regardless of the distance of the destination node) and is set to
guarantee a given reliability level (in terms of bit error
ratio, BER\footnote{In digital transmission, the number of bit errors
  is the number of received bits of a data stream over a communication
  channel that have been altered due to noise, interference,
  distortion or bit synchronization errors. The bit error ratio (BER)
  is the number of bit errors divided by the total number of
  transferred bits during a studied time interval.}) in the worst
case.

In this paper we propose a novel mechanism for improving the energy
efficiency of the transmitters in WiNoC architectures. The basic idea
is to let the transmitter set its transmitting power at run time,
based on the reliability requirements and the destination node of the
current communication. We provide a systematic approach that, under a
reliability constraint (given in terms of maximum BER) and for each
antenna, determines the optimal transmitting power for each
destination node. The optimal transmitting power is computed off-line
by using an accurate 3D field solver for a limited number of
measurements. The obtained power figures are then used for configuring
the proposed variable gain controller, which is responsible for driving
the power amplifier connected to the transmitting antenna. The
proposed technique is general and agnostic with respect to the
underlying WiNoC architecture. The proposed mechanism, introduced
in~\cite{mineo_date14}, is here presented in more detail and applied
to different WiNoC architectures, namely,
iWise64~\cite{ditommaso_hoti11}, McWiNoC~\cite{zhao_nocs11}, and
HmWNoC~\cite{deb_tc13}. Results show the effectiveness of the
proposed technique in improving the energy efficiency (with energy
savings up to 50\%) without any impact on performance and with a
negligible overhead in terms of silicon area. In addition, we show
that, by exploiting the new degree of freedom provided by the
application of the proposed mechanism during the mapping process, it is
possible to further improve the energy metrics.

It should be pointed out that, although the dynamic tuning of the
transmitting power is a well-known technique in the context of general
wireless networks, to the best of our knowledge, this is the first
work in which it is applied in the on-chip context.

%%% <inizio> Commento 2 del Rev 2: eliminare tutta questa parte
%% In this paper we
%% show the feasibility of using a runtime tunable transmitting power
%% technique in the wireless interfaces of WiNoC architectures. We show
%% the achievable advantages in terms of energy saving with a negligible
%% impact on area figures and without affecting the overall communication
%% performance and reliability metrics.

%% The rest of the paper is organized as
%% follows. Section~\ref{sec:related} provides preliminaries on on-chip
%% wireless communications and reviews related work. The proposed
%% adaptive transmitting power transceiver is described in
%% Section~\ref{sec:proposed}. Section~\ref{sec:experiments} presents the
%% results of the experiments. Finally, Section~\ref{sec:conclusions}
%% concludes the paper.
%%%%%%%%%%% <fine>


%% We found
%% that, by integrating the proposed technique into two known Mesh
%% Topology-based WiNoC architectures, namely,
%% iWise64~\cite{ditommaso_hoti11}, McWiNoC~\cite{zhao_nocs11}, and
%% HmWNoC~\cite{deb_tc13} results in an energy reduction of 48\%, 50\%,
%% and 20\%, respectively.

%% Finally
%% the proposed architecture has also been proven for a specific
%% millimeter-wave small-world wireless NoC (HmWNoC) architecture
%% obtaining an average saving of 19\%.

%------------------------------------------------------------------------------

\section{Preliminaries and Related Work}
\label{sec:related}
\begin{table}
  \centering
  \caption{ITRS projections for the transition frequency $f_t$ and maximum 
    oscillating frequency $f_{max}$~\cite{itrs_rfams12}.}
  \label{tab:itrs}
  \begin{tabular}{lcccccc}
    \hline
    Year            & 2012  & 2013 & 2014 & 2015 & 2016 & 2017  \\
    \hline
    $f_t$ (GHz)     & 315   & 315  & 345  & 360  & 375  & 390   \\
    $f_{max}$ (GHz) & 420   & 455  & 490  & 525  & 560  & 595   \\
    \hline
  \end{tabular}
\end{table}
On-chip radio communication was initially introduced for
distributing clock signals across the chip to reduce clock skew
related problems~\cite{floyd_jssc02}. Until then, the main obstacle
was the difficulty of integrating an antenna in a standard silicon
substrate in a way compatible with CMOS technology. This is linked to
the capability of transistors to operate at high
frequencies. Tab.~\ref{tab:itrs} shows the trend of the transition
and maximum oscillating frequencies of MOS transistors as foreseen by
the International Technology Roadmap for Semiconductors
(ITRS)~\cite{itrs_rfams12}. The meaning of this projection is that,
over time, active devices can operate at higher and higher
frequencies. Since the dimension of an antenna has to be comparable
with the wavelength, the first consequence of a higher operating
frequency is that the dimension of the antenna decreases. For
instance, a dipole antenna (simply formed by two conductors) operating
at 60~GHz would measure $\mathrm{632 \times 2 \ \mu m}$ when
integrated in a silicon substrate~\cite{gutierez_jsac09}, whereas at
5.8~GHz its dimension increases to $\mathrm{6.5 \times 2 \ mm}$, which
is comparable with the entire die size. Furthermore, the scaling is
not limited to the antenna: it also affects the passive elements
inside the main building blocks of the RF front-end, which are
responsible for a relevant fraction of its silicon area.

\begin{figure}
  \centering
  \includegraphics[width=0.30\textwidth ,angle=270]{pictures/lna.eps}
  \caption{Common Source RF amplifier.}
  \label{fig:rf_amplifier}
\end{figure}
To have a quantitative idea of the effects of the scaling on some of
the passive elements which form the RF front-end, let us consider the
RF amplifier shown in Fig.~\ref{fig:rf_amplifier}. One of the design
steps is the sizing of the $CL$ group so that it resonates at the
center of the operating band. To do this, the admittance must be set
to zero at the target frequency (center of band), that is:
\begin{equation}
   Y_C + Y_L = j \omega_c C_p + \frac{1}{j\omega_c L}=j\bigg(\omega_c
C_p - \frac{1}{\omega_c L}\bigg) = 0, 
  \label{eq:ammettance}
\end{equation}
where $C_p$ is the parasitic capacitance, which depends on the active
devices and on other effects such as the contact capacitance and
the parasitic capacitance introduced by the inductor itself, and
$\omega_c$ is the angular operating frequency. Thus, solving
Eqn.~(\ref{eq:ammettance}) for $L$, we have:
\begin{equation}
  L = \frac{1}{\omega^2_c C_p}.
  \label{eq:set_ammettance}
\end{equation} 
From Eqn.~(\ref{eq:set_ammettance}) it can be observed that, as the
frequency increases, the value of the inductance can be decreased.
For instance, as reported in~\cite{chang_hpca08}, at 20~GHz the size
of the inductor is approximately $\mathrm{50\ \mu m \times 50\ \mu m}$,
while at 400~GHz it can be reduced to $\mathrm{12\ \mu m\times 12
  \ \mu m}$. 
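
As an illustrative sizing example, assume a purely hypothetical
parasitic capacitance $C_p = 50$~fF (a value chosen here only for
illustration and not taken from any specific design). Under this
assumption, Eqn.~(\ref{eq:set_ammettance}) gives, at 20~GHz and at
60~GHz respectively,
\begin{align*}
  L_{20} &= \frac{1}{(2\pi \cdot 20\cdot 10^{9})^2 \cdot 50\cdot 10^{-15}}
            \approx 1.27~\mathrm{nH}, \\
  L_{60} &= \frac{1}{(2\pi \cdot 60\cdot 10^{9})^2 \cdot 50\cdot 10^{-15}}
            \approx 141~\mathrm{pH}.
\end{align*}
In this simplified sketch, tripling the operating frequency reduces
the required inductance by a factor of nine. In a real design $C_p$
also changes with the technology and the operating frequency, so the
figures above should be read only as an indication of the trend.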

Based on the above considerations, several research groups have proven
the possibility of integrating every building block of the RF front
end (including the antenna) into the same
chip~\cite{floyd_jssc02,o_ted05,lin_jssc07}. In the context of on-chip
communication, the capability of integrating an antenna with its
transceiver into a silicon die~\cite{lin_jssc07} has led several
research groups to assess the advantages of having long-range
wireless links on top of the traditional wire-based NoC. An exhaustive
overview of the state of the art in WiNoC architectures can be found
in~\cite{deb_jetcas12}. There, the authors divide the various WiNoC
architectures into two main classes, namely, mesh-based topology and
small-world-based topology WiNoCs. Another classification can be made
on the basis of the portion of the electromagnetic spectrum used for
data transmission: UWB~\cite{zhao_tc08} (few GHz),
mm-wave~\cite{deb_asap10,ditommaso_hoti11,deb_isqed12,deb_glsvlsi12}
(tens of GHz), sub-THz~\cite{lee_mobicom_09} (hundreds of GHz), and
THz~\cite{ganguly_tc10,abadal_ton14} NoCs. While the first three use
the metallization present in standard CMOS technology as antenna, the
last one makes use of carbon nanotubes. More recent results can be found
in~\cite{take_tc14} and in~\cite{abadal_ieeecm13}. In particular, the
former proposes a wireless 3D NoC architecture which uses inductive
coupling for inter-layer communication, while the latter introduces
antennas based on graphene. Graphene-based antennas operate in the
terahertz band while using less chip area than their metallic
counterparts.

\begin{figure}
  \centering
  \includegraphics[width=0.4\textwidth]{pictures/zig_zag.eps}
  \caption{The zigzag antenna.}
  \label{fig:zigzag}
\end{figure}
In mm-wave WiNoCs, the zigzag antenna (Fig.~\ref{fig:zigzag}) is
considered the best candidate solution for on-chip
antennas~\cite{deb_jetcas12}. A zigzag antenna for the mm-wave band can
be designed and characterized with well-established techniques and
knowledge, such as the use of field solvers. Furthermore, the use of
regular topologies, like 2D meshes, allows the exploitation of
symmetries that simplify their characterization. Several examples of
mesh-based WiNoC architectures can be found in the
literature~\cite{lee_mobicom_09, ditommaso_hoti11, zhao_nocs11,
  wang_pdp11}. In this context, the most used modulation technique is
the Amplitude Shift Keying or On-Off Keying
(ASK-OOK)~\cite{deb_asap10,ditommaso_hoti11,deb_jetcas12}. Although,
for a given bit error rate (BER), the ASK-OOK modulation requires a
higher transmitting power than other modulation
techniques (\eg, the Quadrature Amplitude Modulation
(QAM)~\cite{couch2007digital}) and has a poor spectral efficiency,
its hardware implementation is simple (low area overhead as compared
to QAM) and well suited to the on-chip context. In this
paper, we propose a technique and circuitry for improving the power
efficiency of an ASK-OOK transceiver by means of a reliability-aware
on-line transmitting power modulation. The basic idea presented in
this work was introduced in~\cite{mineo_date14}. Here, we
present the technique in more detail through an extensive
assessment in which it is applied to different
wireless NoC architectures under different traffic
scenarios. Further, the scalability of the proposed technique as the
network size increases is studied. In addition, we show how, by
coupling the proposed technique with a mapping technique, it is
possible to significantly improve the energy saving. Finally,
implementation details and results in terms of area, timing and power
are reported.

%------------------------------------------------------------------------------

\section{Adaptive Transmitting Power Transceiver}
\label{sec:proposed}
This section presents the proposed adaptive transmitting power
transceiver which adaptively determines the optimal transmitting
power, based on the packet destination address, under reliability
constraints expressed in terms of maximum allowed communication bit
error rate (BER).

%------------------------------------------------------------------------------

\subsection{Variable Gain Amplifier Controller}
Traditional transceivers in WiNoC architectures use the same
transmitting power regardless of the distance (location) of the
destination node. In fact, the transmitting power is set for the worst
case under a reliability (\ie, maximum BER) constraint. We propose to
select, at run time, the minimum transmitting power based on the physical
location of the destination node of the current communication. Of
course, the selected minimum transmitting power must be high enough to
meet the communication reliability constraints in terms of BER.

\begin{figure}
  \centering
  \includegraphics[width=0.50\textwidth]{pictures/blocks.eps}
  \caption{Scheme of the proposed adaptive transmitting power
    transceiver ($G_a$ represents the attenuation
    introduced by the wireless medium).}
  \label{fig:tx_scheme}
\end{figure}
The general scheme of the proposed adaptive transmitting power
transceiver is shown in Fig.~\ref{fig:tx_scheme}. As compared to a
traditional transceiver, it makes use of a tunable power amplifier
(PA) controlled by a variable gain amplifier (VGA) controller. In the
rest of the paper, we consider the architecture of the PA presented
in~\cite{daly_jssc07}, which allows several transmitting power
steps. Although dynamically tuning the transmitting power is an
established technique in the context of radio communications (\eg,
mobile phones, wireless sensor networks, \etc), its implementation
requires sophisticated control policies that are hard to replicate in
the WiNoC domain. For this reason, the proposed VGA controller simply
uses the destination address of the packet for accessing a look-up
table containing the configuration words used for configuring the
PA. For a given destination, the associated configuration word
enables the PA to use the minimum transmitting power that reaches
that destination while ensuring a specific reliability level in terms
of BER. Such optimal transmitting power is computed offline, as
discussed in the next subsection.

%------------------------------------------------------------------------------

% \subsection{Friis Transmission Equation}
% \label{ssec:friis}
% \subsection{Signal Strength Requirements}
\subsection{Determining the Minimal Transmitting Power under a BER Constraint}
\begin{figure}
  \centering
  \includegraphics[width=0.45\textwidth]{pictures/friis.eps}
  \caption{Friis transmission equation: geometrical orientation of
    transmitting and receiving antennas. As indicated, considering a
    spherical coordinate system, $\varphi$ is the azimuthal angle in the
    XY plane, where the X axis is $0^\circ$ and Y axis is
    $90^\circ$. $\theta$ is the elevation angle where the Z-axis is
    $0^\circ$, and the XY plane is $90^\circ$.}
  \label{fig:friis}
\end{figure}
The required transmitting power depends on many factors, including
the type of modulation, the transceiver noise figure, and the
attenuation introduced by the wireless medium. Let us consider
Fig.~\ref{fig:friis}, which shows a transmitting antenna with an output
power $P_t$ and a relative angle with respect to the receiving antenna
$(\theta_t,\varphi_t)$, and a receiving antenna, located at distance $R$,
with a relative angle with respect to the transmitting antenna
$(\theta_r,\varphi_r)$. The fraction of the transmitting power that
reaches the terminal of the receiving antenna, $P_r$, can be computed
by the Friis transmission equation, Eqn.~(\ref{eq:friis_complex}), valid when
$R>2D^2/\lambda$, where $D$ is the maximum dimension of the antenna (axial
length in our case) and $\lambda$ is the wavelength. 
\begin{equation}
  \begin{split}
  G_a &= \frac{P_r}{P_t} \\ 
      &= e_t e_r \frac{\lambda^2 D_t(\theta_t,\varphi_t)D_r(\theta_r,\varphi_r)}{(4\pi R)^2}\cdot(1-|\Gamma_t|)(1-|\Gamma_r|)|\hat{\rho_t}\cdot \hat{\rho_r}|
  \end{split}
  \label{eq:friis_complex}
\end{equation} 
where:
\begin{itemize}
  \item $e_t$ and $e_r$ are the efficiencies of the transmitting and
    receiving antenna, respectively. These parameters mainly represent
    the signal losses in the silicon substrate. 
    %% For reducing such
    %% contribution, high resistivity Silicon on Insulator (SoI)
    %% substrates ($>1~\mathrm{K \Omega cm}$) can be
    %% used~\cite{montusclat_ecwt05} or a polyamide stratus (few micron
    %% thick) can be inserted under the antenna~\cite{lee_mobicom_09}.
  
  \item $D_t$ and $D_r$ are the directivities of the transmitting and
    receiving antenna, respectively. They quantify how much better the
    antenna can transmit to or receive from a specific direction.
  
  \item $\lambda$ is the effective wavelength. For an IC substrate, it
    is estimated by using the material properties of the top IC layers
    (silicon dioxide $\epsilon_r=3.9$)~\cite{gutierez_jsac09}.

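  \item $|\Gamma_t|$ and $|\Gamma_r|$ are the magnitudes of the
    reflection coefficients at the transmitting and receiving antenna,
    respectively. They quantify the portion of the
    transmitting/receiving power that is reflected by an impedance
    discontinuity in the transmission medium (ideally
    $|\Gamma_t|=|\Gamma_r|=0$).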

  \item $|\hat{\rho_t} \cdot \hat{\rho_r}|$ takes into account the
    polarization status of the emitted EM wave (ideally, it is equal
    to one).
\end{itemize}
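
To give a rough feeling for the orders of magnitude involved, consider
the idealized case of lossless, matched, polarization-aligned and
isotropic antennas, \ie, $e_t=e_r=1$, $D_t=D_r=1$,
$|\Gamma_t|=|\Gamma_r|=0$, and $|\hat{\rho_t}\cdot\hat{\rho_r}|=1$.
These values, together with the operating frequency $f=60$~GHz and the
distance $R=10$~mm used below, are assumptions made here only for
illustration, and the far-field condition is assumed to hold. With
$\epsilon_r=3.9$ and $c \approx 3\cdot 10^{8}$~m/s:
\begin{align*}
  \lambda &= \frac{c}{f\sqrt{\epsilon_r}}
           = \frac{3\cdot 10^{8}}{60\cdot 10^{9}\cdot\sqrt{3.9}}
           \approx 2.53~\mathrm{mm}, \\
  G_a &= \left(\frac{\lambda}{4\pi R}\right)^2
       \approx \left(\frac{2.53}{125.7}\right)^2
       \approx 4.1\cdot 10^{-4} \approx -33.9~\mathrm{dB}.
\end{align*}
In this simplified model, doubling $R$ lowers $G_a$ by a further
6~dB, which suggests how strongly the power needed to reach a
destination may depend on its position.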

Eqn.~(\ref{eq:friis_complex}) highlights the parameters which
determine the gain $G_a$. It represents a first-order model of the
wireless channel which is valid for free-space
communications. Although second-order effects, including wave
reflections due to metal structures and multi-path effects, are not
modelled by Eqn.~(\ref{eq:friis_complex}), the Friis equation is a good
starting point for understanding which parameters affect the
attenuation. The work in~\cite{tap_07} presents a detailed study on the
propagation of radio waves in an on-chip context and confirms the
effect of the directivity and the distance in the Friis formula.
Since, as discussed above, the attenuation cannot be easily estimated
by means of mathematical models, in this work the computation of $G_a$
is carried out by means of Eqn.~(\ref{eq:friis_measured}).
\begin{equation}
  G_a=\frac{P_r}{P_t}=\frac{|S_{12}|}{(1-|S_{11}|)(1-|S_{22}|)},
  \label{eq:friis_measured}
\end{equation}
where $S_{11}$, $S_{12}$, and $S_{22}$ are the scattering
parameters. These parameters are not predicted analytically but obtained
by using accurate field solver simulation tools~\cite{floyd_jssc02} or by
direct measurements on realized prototypes by means of a network analyzer.

Using Eqn.~(\ref{eq:friis_measured}) it is possible to estimate the
signal attenuation due to the wireless medium.  Since the
communication reliability is related to the energy per bit, $E_b$,
spent to reach the receiver's antenna, we can determine the power
required by the transmitter for each value of attenuation $G_a$. In
particular, for the ASK-OOK modulation the bit error rate can be
computed as:
\begin{equation}
  BER=Q\bigg( \sqrt{\frac{E_b}{N_0}}\bigg),
  \label{eq:ber}
\end{equation}
where $N_0$ is the transceiver noise spectral density and the $Q$
function is the tail probability of the standard normal distribution
defined by Eqn.~(\ref{eq:q_func}).
\begin{equation} 
  \label{eq:q_func}
  Q(x)=\frac{1}{\sqrt{2\pi}}\int_{x}^{\infty} e^{-\frac{y^2}{2}}dy.
\end{equation}
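
To illustrate how steeply the BER of Eqn.~(\ref{eq:ber}) depends on
the received energy per bit, consider two operating points chosen here
only as an example. For $E_b/N_0 = 36$ (about 15.6~dB),
\[ BER = Q(6) \approx 9.9\cdot 10^{-10}, \]
whereas halving the energy per bit, \ie, $E_b/N_0 = 18$, gives
\[ BER = Q\big(\sqrt{18}\big) \approx 1.1\cdot 10^{-5}. \]
A 3~dB reduction of the received energy thus degrades the BER by about
four orders of magnitude in this example, which is why the
transmitting power cannot be lowered below the level dictated by the
reliability constraint.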

Since $E_b=P_r/R_b$, where $P_r$ is the power received at the terminal
of the receiver antenna and $R_b$ is the data rate, we can compute
the received power required for a given data rate and BER
requirement and for a given transceiver thermal noise as:
\begin{equation}
  P_r = E_b \cdot R_b = \left[Q^{-1}(BER)\right]^2 N_0 R_b,
  \label{eq:pr}
\end{equation}
where $Q^{-1}$ is the inverse of the $Q$ function.

Thus, the minimum transmitting power to reach a certain receiver
guaranteeing a maximum BER can be computed as:
\begin{equation}
  P_t(dBm) = P_r(dBm) - G_a(dB),
  \label{eq:pt}
\end{equation}
where $P_r(dBm)$ is given by Eqn.~(\ref{eq:pr}) expressed in dBm, while
$G_a(dB)$ is the attenuation of Eqn.~(\ref{eq:friis_measured}),
computed by using a field solver and expressed in dB.\footnote{The
  absolute power, $P$ (in watts), can be expressed in dBm by
  $P_{dBm} = 10 \cdot \log_{10}{(P \cdot 10^3)}$.}
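
As a worked example of Eqns.~(\ref{eq:pr}) and~(\ref{eq:pt}), consider
the following purely illustrative figures, which are assumptions made
here and not measured values: a maximum BER of about $10^{-9}$, for
which $Q^{-1}(BER)\approx 6$, a noise spectral density
$N_0=-170$~dBm/Hz, and a data rate $R_b=16$~Gb/s. Expressing
Eqn.~(\ref{eq:pr}) in logarithmic units, with $R_b$ in bit/s:
\begin{align*}
  P_r(dBm) &= 10\log_{10}(6^2) + N_0(dBm/Hz) + 10\log_{10}(R_b) \\
           &\approx 15.6 - 170 + 102.0 = -52.4~\mathrm{dBm}.
\end{align*}
For two hypothetical destinations with $G_a=-30$~dB and $G_a=-36$~dB,
Eqn.~(\ref{eq:pt}) then gives
\begin{align*}
  P_t &= -52.4 - (-30) = -22.4~\mathrm{dBm}, \\
  P_t &= -52.4 - (-36) = -16.4~\mathrm{dBm}.
\end{align*}
In this example, a transmitter sized only for the farther destination
would spend 6~dB (about four times) more power than needed whenever it
addresses the nearer one.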

%------------------------------------------------------------------------------

\subsection{Overall Flow}
\label{ssec:pmap_det}
We now present the basic steps needed for determining the optimal
transmitting power for each node pair. For the sake of clarity, we
consider the case in which radio hubs are arranged in a mesh
topology. This simplifies the characterization of the antennas,
as symmetries can be exploited.
%% Such transmitting power information are clustered with a
%% certain granularity based on the number of power steps used by the PA
%% and stored into a lookup table in the VGA controller. The basic steps
%% are summarized as follows.
\begin{enumerate}
  \item Computing the attenuation map. For each pair \textless
    transmitting antenna, receiving antenna\textgreater, extract the
    scattering parameters $S_{11}$, $S_{12}$, and $S_{22}$ and compute
    the gain by means of Eqn.~(\ref{eq:friis_measured}). In this paper
    we use an accurate field solver simulator for estimating the
    scattering parameters; however, if a test chip is available, they
    can be directly measured by means of a network
    analyzer~\cite{o_ted05}.
  
  \item Computing the power map. For each pair \textless transmitting
    antenna $i$, receiving antenna $j$\textgreater, based on the
    required transmission data rate and the maximum allowed BER, use
    Eqns.~(\ref{eq:pr}) and~(\ref{eq:pt}) for computing the minimum
    transmitting power that meets the BER constraint. Let us indicate
    this transmitting power value with $PM(i,j)$.

  %% \item Cluster the optimal transmitting power set. By means of a
  %%   clustering method (\eg, K-Means), cluster the optimal transmitting
  %%   power values computed in the previous step in a number of clusters
  %%   equal to the chosen number of power steps. Let us indicate with
  %%   $PS={ps_1, ps_2, \ldots, ps_n}$ the set of power steps represented
  %%   by the centroids of the clusters.

  \item Determining the power steps. Let $n$ be the number of desired
    power steps and $PM_{min}$, $PM_{max}$ the minimum and maximum
    value of $PM$, respectively. The set of power steps $PS=\{ps_1,
    ps_2, \ldots, ps_n\}$ is defined by dividing the interval
    $[PM_{min}, PM_{max}]$ in $n$ equally spaced levels for which the
    $i$-th power step is:
    \[ ps_i = PM_{min} + (i-1) \times \frac{PM_{max} - PM_{min}}{n-1}. \]

  \item Configuring the VGA controller. Upload the look-up table into
    each VGA controller as follows. Let $LUT_i$ be the look-up table
    of the VGA controller in radio hub $i$. $LUT_i(j)$ encodes the
    power step to be used to transmit to radio hub $j$. This power
    step is selected as the minimum $ps \in PS$ such that $PM(i,j) \le
    ps$.
\end{enumerate}
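
As a purely illustrative instance of the last two steps, assume that
the power map spans $PM_{min}=-21~\mathrm{dBm}$ to
$PM_{max}=-1~\mathrm{dBm}$, that the power steps are spaced uniformly
on the dBm scale, and that $n=3$ (these values and the choice of scale
are assumptions made only for this example). The step formula then
gives
\begin{align*}
  ps_1 &= -21~\mathrm{dBm}, \\
  ps_2 &= -21 + 1 \times \frac{-1-(-21)}{3-1} = -11~\mathrm{dBm}, \\
  ps_3 &= -21 + 2 \times \frac{-1-(-21)}{3-1} = -1~\mathrm{dBm}.
\end{align*}
A hypothetical pair with $PM(i,j)=-15~\mathrm{dBm}$ would be assigned
$LUT_i(j)=ps_2$, the smallest step not below $-15~\mathrm{dBm}$, so
that about $4$~dB more than strictly needed is transmitted. A larger
$n$ reduces this rounding loss.
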
In the next section we assess the effectiveness of the proposed
technique in terms of communication energy reduction.

%------------------------------------------------------------------------------

%% \subsection{The Mapping Problem}
%% \label{sec:mapping}
%% Several works in literature have shown the effectiveness of mapping
%% techniques for improving different metrics including performance and
%% energy consumtion~\cite{}. The role played by the mapping becomes even
%% more important in the context of WiNoCs due to the possibility of
%% exploiting a additional degrees of freedom, including, the association
%% between the radio hub and the cluster of concentrated cores, the
%% directionality of the antenna, the number of radio channels to be
%% used, etc. In this paper, we explore one of the aforementioned new
%% mapping dimensions, namely, the association between the radio hub and
%% the cluster of concentrated cores. Specifically, the mapping problem,
%% shown in Fig.~\ref{fig:mapping_description}, is formulated as follows.
%% \begin{figure}
%%   \centering
%%   \includegraphics[width=0.45\textwidth]{pictures/mapping_description.eps}
%%   \caption{The mapping process.}
%%   \label{fig:mapping_description}
%% \end{figure}

%% Let $NG=G(R, RR, L_{R}, L_{RR})$ be the \emph{network graph} where $R$
%% is the set of routers, $RR$ is the set of radio routers, $L_{R}$ is
%% the set of links connecting the routers in $R$, and $L_{RR}$ is the
%% set of links connecting the radio routers in $RR$ with the routers in
%% $R$. We assume that all the links $l_{R} \in L_{R}$ and $l_{RR} \in
%% L_{RR}$ have the same bandwidth capacity $cap$ and the same energy
%% consumption per bit $e_l$. Let $PE$ the set of
%% processing elements. We assume direct networks for each there is a
%% router for each processing element.

%% Let $e_r(rr_s,rr_d)$, with $rr_s,rr_d \in RR$, be the \emph{radio
%%   transmission energy function} which provides the minimum
%% transmission energy per bit for a radio communication from radio
%% router $rr_s$ to radio router $rr_d$.

%% Let $AG=G(T,C)$ be the \emph{application graph} where $T$ is the set
%% of tasks and $C$ is the set of communications among tasks. Let $bnd(c)$
%% and $vol(c)$ be the \emph{communication bandwidth} (in bit/sec) and the
%% \emph{communication volume} (in bit) of communication $c \in C$,
%% respectively.


%% Based on the above definitions, the mapping problem can be formulated
%% as follows. Find a \emph{mapping function}, $map:T \rightarrow PE$,
%% such that the communication energy is minimised and the bandwidth
%% constraints are met. The communication energy, $E$, is the the product
%% between the communication volume and the total energy per bit spent on
%% links and radio transmissions over all the communications. It is
%% computed as follows:
%% \begin{equation}
%%   \begin{aligned}
%%   E =& \sum_{\substack{c=(t_s,t_d) \in C \\ pe_s=map(t_s)
%%         \\ pe_d=map(t_d)}} vol(c) \big[ |LT(pe_s,pe_d)| e_l + \\
%%        & + \sum_{(rr_s,rr_d) \in RRP(pe_s,pe_d)} e_r(rr_s,rr_d) \big],
%%   \end{aligned}
%%   \label{eqn:mapping_energy}
%% \end{equation}
%% where $LT(pe_s,pe_d)$ returns the set of links traversed for the
%% communication between $pe_s$ and $pe_d$, and $RRP(pe_s,pe_d)$ returns
%% the set of radio router pairs (transmitter, receiver) involved in the
%% communication between $pe_s$ and $pe_d$.

%% The bandwidth constraints refer to the fact that the aggregated
%% bandwidth on links cannot exceed their capacity. That is:
%% \[ \sum_{\substack{c=(t_s,t_d) \in C \\ pe_s=map(t_s)
%%     \\ pe_d=map(t_d)}} bnd(c) \times PT(pe_s,pe_d,l) \leq cap \quad
%% \forall l \in L_R, \] where $PT(pe_s,pe_d,l)$ is the pass-through
%% function which returns 1 if $l$ belongs to the routing path for the
%% communication between $pe_s$ and $pe_d$ and 0 otherwise. That is,
%% $PT(pe_s,pe_d,l)=1$ if $l \in LT(pe_s,pe_d)$.

%% Differently from the traditional mapping techniques proposed in
%% literature~\cite{}, here, the mapping selection depends also by the
%% location of the radio routers which is accounted by the radio
%% transmission energy function $e_r$ in
%% Eqn.~(\ref{eqn:mapping_energy}). Such additional degree of freedom
%% results in new opportunities for energy optimization as it will be
%% shown in the experiments section.

%------------------------------------------------------------------------------

\section{Experiments}
\label{sec:experiments}
In this section we present the results of experiments in which a WiNoC
architecture implemented on a $\mathrm{20 \ \textrm{mm} \times 20
  \ \textrm{mm}}$ silicon die is considered. A zigzag antenna has been
accurately modeled and characterized with Ansoft HFSS~\cite{hfss}
(High Frequency Structure Simulator). HFSS is a leading commercial
finite element method (FEM) field solver which simulates 3D structures
and produces S-parameters and radiation patterns. We considered a
high-resistivity ($\rho=5~\mathrm{k\Omega\,cm}$) SOI with a substrate
thickness of $350~\mathrm{\mu m}$ and $30~\mathrm{\mu m}$ for the
oxide ($\mathrm{SiO_2}$). The antennas are situated at an elevation of
$2~\mathrm{\mu m}$ from the substrate, in compliance with the guidelines
reported in~\cite{seok_iitc05} for reducing the interference with
other metal structures (\cite{seok_iitc05} demonstrates that the
interference due to other metallic structures is negligible when
such rules are followed).  The zigzag antenna has a thickness of
$2~\mathrm{\mu m}$ and an axial length of $2 \times 340~\mathrm{\mu
  m}$ for operating at around 60~GHz. The same setup has been used
in~\cite{montusclat_ecwt05}.

From the HFSS simulations we obtain the scattering parameters ($S_{11}$ and
$S_{12}$) used for evaluating the Friis formula and, in turn, for
calculating the attenuation introduced by the wireless medium. In
particular, $S_{11}$ is also used for determining the antenna
bandwidth, as discussed in the following subsection.

%------------------------------------------------------------------------------

\subsection{Bandwidth and Radiation Pattern}
\begin{figure}
  \centering
  \includegraphics[width=0.45\textwidth]{pictures/s11.eps}
  \caption{$S_{11}$ parameter of the zigzag antenna. The bandwidth is
    the range of frequencies for which $S_{11}$ is below $-10$~dB.}
  \label{fig:s11}
\end{figure}
Fig.~\ref{fig:s11} shows the $S_{11}$ parameter, which quantifies the
portion of transmitting power reflected to the power amplifier due to
impedance mismatch ($50~\mathrm{\Omega}$). Based on a rule of
thumb~\cite{balanis2008modern}, it can be assumed that the antenna
impedance matches that of the transceiver when, at the operating
frequency, $S_{11}$ is less than $-10$~dB.  We used $S_{11}$ for
defining the antenna bandwidth because, outside of the range of
frequencies for which $S_{11} < -10~\mathrm{dB}$, the antenna not only
fails to work properly as a transducer but could also affect the physical
integrity of the final stage of the PA.

Looking at Fig.~\ref{fig:s11}, the antenna bandwidth is about 16~GHz, which is
enough for providing a data rate upper bound of 8~Gbps with ASK-OOK
modulation. Denoting this bandwidth by $B_W$, the antenna
relative bandwidth is:
\[  B_{r}=\frac{B_W}{f_c}=\frac{16~\mathrm{GHz}}{59~\mathrm{GHz}} = 0.27 \]
where $f_c$ is the resonance frequency. This information is useful for
determining the resonance frequency for which the antenna should be designed
in order to obtain data rates higher than 8~Gbps, or to make
more bandwidth available for frequency division multiplexing (FDM). For
instance, for 4~channels with a data rate of 8~Gbps each, we need
an antenna with a resonance frequency of at least:
\[  f_c=\frac{B_W}{B_r}=\frac{4 \times 16~\mathrm{GHz}}{0.27}=237~\mathrm{GHz} \]
which is obtainable by properly scaling the dimensions of the antenna
(mainly the axial length).
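
As a further check of this scaling rule, and under the assumption
(implicit above) that the relative bandwidth $B_r$ is preserved when
the antenna is scaled, a hypothetical FDM configuration with
$k$~channels of 8~Gbps each requires $f_c = k \, B_W / B_r$. For
$k=2$, for example,
\[ f_c=\frac{2 \times 16~\mathrm{GHz}}{0.27} \approx 119~\mathrm{GHz}, \]
that is, the required resonance frequency grows linearly with the
number of channels.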

\begin{figure}
  \centering
  \includegraphics[width=0.35\textwidth, angle=270]{pictures/radiation.eps}
  \caption{Radiation pattern for a zigzag antenna at the horizon
    ($\varphi=90^\circ$, continuous line) and at the elevation of maximum
    radiation ($\varphi=35^\circ$, dashed line). $\theta=0^\circ$ is the
    direction parallel to the antenna's main axis while $\theta=90^\circ$ is
    the orthogonal direction. According to Fig.~\ref{fig:friis}, we
    assume that the antenna lies on the XY plane.}
  \label{fig:radiation}
\end{figure}
Another important result from simulation is the normalized radiation
pattern shown in Fig.~\ref{fig:radiation}. The radiation pattern is a
polar representation of the directivity represented by the term $D$ in
Eqn.~(\ref{eq:friis_complex}). As can be observed, the best
performance is obtained when the antenna transmits or receives along
the direction of its main axis. This information gives an
indication of the attenuation in a particular direction
(see Eqn.~(\ref{eq:friis_complex})), as will be shown in the next
subsections.

%------------------------------------------------------------------------------

\subsection{Attenuation Maps}
\label{ssec:attenuation_map}
\begin{figure*}
  \centering
  \begin{tabular}{cccc}
    \includegraphics[width=0.22\textwidth]{pictures/pmap_c0.eps} &
    \includegraphics[width=0.22\textwidth]{pictures/pmap_c1.eps} &
    \includegraphics[width=0.22\textwidth]{pictures/pmap_c4.eps} &
    \includegraphics[width=0.22\textwidth]{pictures/pmap_c5.eps}
  \end{tabular}
  \caption{HFSS simulation results: attenuation map ($G_a$) for the
    tiles t0, t1, t4 and t5.  The other maps can be obtained
    by exploiting the symmetries of the structure.}
  \label{fig:pmap}
\end{figure*}
Let us consider a mesh-based WiNoC formed by a set of $T$ tiles and a
radio hub for each tile. We analyze the attenuation of the signal
transmitted by an antenna in a tile $t \in T$ as perceived by the
other antennas located at tiles $T \setminus \{t\}$. In the
experiments we considered $|T|=16$ in which the distance between two
antennas in the same axis is 2.5~mm.

Fig.~\ref{fig:pmap} shows the attenuation $G_a$ for a transmitting
antenna located on tile $t_0$, $t_1$, $t_4$, and $t_5$. The other
attenuation maps (\ie, the attenuations when the transmitting antenna
is located in other tiles) can be found by symmetry. In fact, the
antenna exhibits very different behavior when it is placed in
different locations within the die~\cite{gutierez_jsac09}. Thus, the
measurements should in principle be performed for all the possible positions
of the transmitting and receiving antennas. Thanks to the symmetrical
structure of mesh-based topologies, only four measurements are needed in
our case. For instance, the attenuation observed by a receiving
antenna at tile $t_{13}$ when the transmitting antenna is on tile
$t_{12}$, $G_a(t_{12},t_{13})$, is the same as that observed by the
receiving antenna located on tile $t_1$ when the transmitting antenna
is on tile $t_0$, $G_a(t_{0},t_{1})$. Similarly, we have
$G_a(t_{15},t_{14})=G_a(t_{0},t_{1})$,
$G_a(t_{3},t_{2})=G_a(t_{0},t_{1})$, and so on. In addition,
$G_a(t_x,t_y)=G_a(t_y,t_x)$ for each $t_x, t_y \in T$.
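
The symmetry argument can be made explicit under the assumption,
consistent with the examples above but introduced here only for
illustration, that tiles are numbered row by row, so that tile $t_k$
sits at row $r=\lfloor k/4 \rfloor$ and column $c=k \bmod 4$ of the
$4 \times 4$ mesh. The three mirror maps (the symbols $\sigma_v$,
$\sigma_h$ and $\sigma_{hv}$ are hypothetical names) are
\begin{align*}
  \sigma_v(r,c) &= (3-r,\; c), \\
  \sigma_h(r,c) &= (r,\; 3-c), \\
  \sigma_{hv}(r,c) &= (3-r,\; 3-c).
\end{align*}
For instance, $\sigma_v$ sends $t_{12}=(3,0)$ and $t_{13}=(3,1)$ to
$t_0=(0,0)$ and $t_1=(0,1)$; $\sigma_h$ sends $t_3=(0,3)$ and
$t_2=(0,2)$ to $t_0$ and $t_1$; and $\sigma_{hv}$ sends
$t_{15}=(3,3)$ and $t_{14}=(3,2)$ to $t_0$ and $t_1$. Every tile is
brought into the quadrant $\{t_0, t_1, t_4, t_5\}$ by one of these
maps or by the identity, which is why four attenuation maps suffice.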

As can be observed from Fig.~\ref{fig:pmap}, the attenuation
introduced by the wireless medium depends not only on the relative
distance between the radio hubs but also on their relative
orientation. For instance, $G_a(t_0,t_3)<G_a(t_0,t_4)$ although the
distance between $t_0$ and $t_3$ is three times larger than the
distance between $t_0$ and $t_4$. This can be explained by observing the
radiation pattern in Fig.~\ref{fig:radiation}, in which the performance
of the antenna improves as it transmits or receives along its main
axis direction.

In conclusion, the attenuation map is used for computing the maximum
and minimum transmitting power for guaranteeing a certain reliability
level. As an example, let us consider a maximum BER of $3
\times 10^{-14}$ and a data rate of 8~Gbps. From Eqn.~(\ref{eq:pr}),
the power received by the receiving antenna must be $-54$~dBm. From the
attenuation maps shown in Fig.~\ref{fig:pmap}, the maximum attenuation
is $-53$~dB. Thus, the transmitting power (which is maximum as this is
the worst case) is computed by Eqn.~(\ref{eq:pt}) as $P_{t,max} = -54
- (-53) = -1~\mathrm{dBm}$, which in linear scale is
$P_{t,max}=794~\mathrm{\mu W}$. Similarly, we can compute the minimum
transmitting power. The minimum attenuation is $-33$~dB, thus
$P_{t,min} = -54 - (-33) = -21~\mathrm{dBm}$, which in linear scale is
$P_{t,min}=8~\mathrm{\mu W}$.
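
For completeness, the linear-scale values follow from the standard
conversion $P_{[\mathrm{mW}]} = 10^{P_{[\mathrm{dBm}]}/10}$:
\begin{align*}
  P_{t,max} &= 10^{-1/10}~\mathrm{mW} \approx 0.794~\mathrm{mW} = 794~\mathrm{\mu W}, \\
  P_{t,min} &= 10^{-21/10}~\mathrm{mW} \approx 7.94 \times 10^{-3}~\mathrm{mW} \approx 8~\mathrm{\mu W}.
\end{align*}
The two extremes are 20~dB apart, which corresponds to a factor of 100
in linear power.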

%------------------------------------------------------------------------------

\subsection{VGA Controller Analysis}
Let us consider the architecture of the transceiver proposed
in~\cite{daly_jssc07}, also used in~\cite{ditommaso_hoti11}. This
transceiver provides different transmitting power steps, but
neither~\cite{daly_jssc07} nor~\cite{ditommaso_hoti11} defines the
control policy for setting the appropriate power step.  For the
transceiver we estimate a power consumption of 7~mW to 23~mW for the
minimum and maximum transmitting power, respectively. These values
correspond to an energy per bit ranging from 0.42~pJ/bit to
1.4~pJ/bit.

\begin{figure}
  \centering
  \includegraphics[width=0.45\textwidth]{pictures/power.eps}
  \caption{Average power dissipated by the VGA controller for
    different power steps and different packet sizes.}
  \label{fig:vga_power}
\end{figure}
With regard to the logic of the VGA controller, it has been synthesized
and evaluated by using Synopsys Design Compiler considering different
numbers of admissible power steps (3, 7 and 15 power steps).
Starting from the gate-level implementation of the controller, the power
analysis has been performed with various test benches that vary
the packet size.  In fact, as the packet size increases, the toggle
rate of the VGA controller decreases as it is active only for the
header flit of the packet. Fig.~\ref{fig:vga_power} shows the average
power dissipation of the VGA controller for different packet sizes
considering a 28~nm CMOS standard cell library from TSMC operating at
1~GHz. As can be observed, for a 10-flit packet, the average power
dissipation of the VGA controller is as low as $\mathrm{21 \ \mu W}$
for the 3-step implementation, and about $\mathrm{50 \ \mu W}$ for the
15-step implementation. {\color{red} It is also interesting to notice
  that, above the 4-flit packet size, the power overhead introduced by
  the VGA controller stabilizes to its minimum value. This is due to
  the fact that the VGA controller is activated by the header flit and
  it remains idle for the rest of the flits. As a result, its duty
  cycle decreases as the packet size increases and, above four flits,
  its energy contribution spread over the transmission time becomes
  negligible.}
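
A first-order way to read this trend, offered here as an assumption
rather than as the model behind the synthesis results, is to treat the
controller as active for one header flit out of the $N$ flits of each
packet. With $P_{act}$ and $P_{idle}$ denoting hypothetical active and
idle power levels, the activity factor and the average power are
\begin{align*}
  \alpha(N) &= \frac{1}{N}, \\
  P_{avg}(N) &\approx P_{idle} + \frac{P_{act} - P_{idle}}{N}.
\end{align*}
The activity factor is 25\% for $N=4$ and 10\% for $N=10$, so the
activity-dependent term shrinks as $1/N$ and $P_{avg}$ approaches
$P_{idle}$ as the packet size grows.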

\begin{figure}
  \centering
  \includegraphics[width=0.35\textwidth]{pictures/area_timing.eps} 
  \caption{VGA controller synthesis results: area and delay overhead.}
  \label{fig:vga_timing_area}
\end{figure}
Fig.~\ref{fig:vga_timing_area} shows the area and timing overhead due
to the VGA controller for different numbers of power steps. With regard
to the area overhead, it ranges from $\mathrm{50 \ \mu m^2}$ to
$\mathrm{90 \ \mu m^2}$ for the implementations with 3 and 15 power
steps, respectively. Timing results are shown in terms of FO4. In
order to determine the set-up time for configuring the proper
transmitter bias level (power step), the Digital to Analogue
Converter (DAC) in the power control circuitry has been considered
as a lumped load for the generated gate-level net-list and used as a
constraint during the synthesis phase. {\color{red}As can be
  observed, a high penalty is incurred when passing from the 3-step
  configuration to the 7-step configuration both in terms of area
  (12\%) and timing (10\%). Passing to 15 steps, such penalties reduce
  to 3\% and 4\% for area and timing, respectively. Based on this, the
  3-step configuration represents the most cost-effective
  configuration when a coarse granularity of the power step can be
  tolerated (e.g., when the variance of the attenuation map is
  low). Conversely, in those cases in which a fine tuning of the power
  steps is required, the 15-step configuration should be preferred as
  it introduces a relatively low overhead as compared to the 7-step
  configuration.}

\begin{figure*}
  \centering
  \includegraphics[width=0.80\textwidth]{pictures/pipeline.eps}
  \caption{Pipeline of a conventional radio hub.}
  \label{fig:pipeline}
\end{figure*}


In order to assess how the introduction of the VGA controller impacts
the overall delay metrics of the radio hub, we consider the pipeline
structure of the radio hub shown in Fig.~\ref{fig:pipeline}.
%architecture~\cite{deb_jetcas12} shown in Fig.~\ref{fig:pipeline} 
%and augmented with the proposed VGA controller}.  
%as depicted in Fig.~\ref{fig:pipeline}. 
The radio hub is derived from the baseline
router~\cite{dally_book04,matsutani_tc11} augmented with the proposed
VGA controller. The transceiver is attached to its local port. For
each pipeline stage, Fig.~\ref{fig:pipeline} reports timing
information related to the critical path. As can be observed, the
VGA controller works in parallel while incoming flits are transferred
to the serializer before the radio transmission. Thus, in terms of
latency, the use of the proposed technique does not affect the
pipeline depth of the radio hub. In terms of clock frequency, the
delay introduced by the VGA controller does not impact the critical
path of the slowest stage (\ie, buffer read and crossbar). For
instance, the 15-step implementation of the VGA controller (\ie, the
slowest one among the three considered in this paper), exhibits a
delay of 8~FO4 which is far below the 16.6~FO4 delay exhibited by the
buffer read and crossbar operations in the same stage.
\begin{figure}
  \centering
  \begin{tabular}{cc}
    \includegraphics[width=0.23\textwidth]{pictures/area_breakdown.eps} &
    \includegraphics[width=0.23\textwidth]{pictures/power_breakdown.eps} \\
    (a) & (b)
  \end{tabular}
  \caption{Area (a) and power (b) breakdown of the radio hub.}
  \label{fig:breakdown}
\end{figure}
Finally, Fig.~\ref{fig:breakdown} shows the area and power breakdown
of the radio hub. As can be observed, the VGA controller accounts
for a negligible fraction (less than 0.05\%) of the overall area and
power budget.

%------------------------------------------------------------------------------

\begin{figure}
  \centering
  \begin{tabular}{cccc}
    \includegraphics[width=0.20\textwidth]{pictures/topology-mesh.eps} &
    \includegraphics[width=0.20\textwidth]{pictures/topology-mc_winoc} &
    \includegraphics[width=0.20\textwidth]{pictures/topology-iwise.eps} &
    \includegraphics[width=0.20\textwidth]{pictures/topology-small_world} \\
    (a) & (b) & (c) & (d)
  \end{tabular}
  \caption{{\color{red}Topologies of the considered network
    architectures. (a) Wire-line mesh, (b) McWiNoC, (c) iWise, (d)
    Small-world.}}
  \label{fig:topologies}
\end{figure}
\subsection{Total Energy Saving in Mesh-Topology-Based WiNoCs}
The effectiveness of the proposed technique is affected by the number
of power steps provided by the VGA controller. For quantifying such
impact, we apply the proposed technique to two different mesh
topology-based WiNoC architectures proposed in the literature, namely,
iWise~\cite{ditommaso_hoti11} and
McWiNoC~\cite{zhao_nocs11}. {\color{red}Specifically, we compare the
following 64-node NoC architectures:}
\begin{enumerate}
  \item Wire-line: A traditional $8 \times 8$ concentrated mesh, with
    clusters formed by 4~cores [Fig.~\ref{fig:topologies}(a)].

  \item McWiNoC: The architecture described in~\cite{zhao_nocs11} for
    an $8 \times 8$ mesh with 4~cores associated with each radio
    hub. This kind of architecture uses TDM multiplexing for the
    wireless medium. The entire bandwidth can be allocated
    to each communication due to the particular structure of the
    architecture [Fig.~\ref{fig:topologies}(b)].

  \item Proposed McWiNoC: Like McWiNoC but augmented with the proposed
    VGA controller.

  \item iWise64: The architecture described in~\cite{ditommaso_hoti11}
    in which the entire bandwidth is divided into four different
    channels [Fig.~\ref{fig:topologies}(c)].

  \item Proposed iWise64: Like iWise64 but augmented with the proposed
    VGA controller.
\end{enumerate}
{\color{red} Power data presented in the previous subsection have been
  used for back-annotating a cycle accurate NoC simulator based on
  Noxim~\cite{noxim} which has been extended for simulating WiNoC
  architectures. In addition, the power models used for links have
  been extended by means of the analytical power equations
  in~\cite{mineo_dsd13} which take into account both the switching
  activity and the coupling switching activity of the link
  bitlines. In all the experiments, unless otherwise specified,
  wormhole switching is used and the input buffer depths of routers and
  radio hubs are set to 4 and 8 flits, respectively. The routing
  algorithm modelled in the simulator is the same as that considered
  in~\cite{yu_dt14} which is based on an up/down tree-based routing
  algorithm that uses a multiple-tree-roots-based
  mechanism~\cite{flich_tpds12}.}

\begin{figure*}
  \centering
  \begin{tabular}{cc}
    \includegraphics[width=0.45\textwidth]{pictures/power_steps_iwise.eps} &
    \includegraphics[width=0.45\textwidth]{pictures/power_steps_mcwinoc.eps} \\
    (a) & (b)
    \end{tabular}
  \caption{Energy saving over a traditional wire-line NoC when the
    proposed VGA controller is applied to an iWise64 architecture (a)
    and to a McWiNoC architecture (b).}
  \label{fig:power_steps}
\end{figure*}
Assuming the wire-line NoC as baseline, Fig.~\ref{fig:power_steps}
shows the overall communication energy saving for different SPLASH-2
benchmarks. {\color{red}The benchmarks have been selected
  by clustering them based on the average communication
  distance and the standard deviation of the communication
  distances. Based on such figures, the benchmarks are classified into
  three classes, namely, long-range, mid-range, and short-range, and
  two representative benchmarks for each of these classes are
  considered.} The proposed VGA controller is applied to iWise and
McWiNoC. In particular, we considered four versions of the VGA
controller, namely, 3-, 7-, 15-, and INF-step, which refer to the
considered number of power steps. Note that the INF-step
version is a theoretical case (\ie, it represents an upper bound in
terms of energy saving) in which the transmission energy is tuned in a
continuous, rather than discrete, fashion. As can be observed, on
average, iWise and McWiNoC are 22\% and 12\% more energy efficient
than the traditional wire-line NoC. By using the proposed approach,
the energy saving increases, on average, by 50\% and 46\% for
iWise and McWiNoC, respectively.  As expected, the number of power
steps impacts the energy saving, but no relevant improvements are
observed with more than 7 power steps. For this reason, in the rest of
the experiments, we assume a VGA controller with 7 power steps unless
otherwise specified.

%------------------------------------------------------------------------------
\subsection{Total Energy Saving in Small-World Network Based WiNoCs}
In order to explore the impact of the proposed scheme in mm-wave
small-world-based WiNoCs (HmWNoC), we apply the proposed scheme to the
HmWNoC architecture presented in~\cite{deb_tc13}
[Fig.~\ref{fig:topologies}(d)]. Such HmWNoC is a two-level hierarchical
network where the top level is a mesh topology whereas the lower-level
subnetworks are star-ring networks. Since the upper network is a
mesh, the set-up used for obtaining the attenuation maps
(\cf~Sec.~\ref{ssec:attenuation_map}) can be easily reused.

\begin{figure*}
  \centering
  \begin{tabular}{ccc}
    \includegraphics[width=0.30\textwidth]{pictures/mswinoc_256_saving.eps} &
    \includegraphics[width=0.30\textwidth]{pictures/mswinoc_576_saving.eps} &
    \includegraphics[width=0.30\textwidth]{pictures/mswinoc_1024_saving.eps}
  \end{tabular}
   \caption{Percentage of energy saving when the proposed technique is
     applied to HmWNoC architectures with different size (number of
     nodes) and different number of radio hubs.}
  \label{fig:results_mswinoc}
\end{figure*}
Fig.~\ref{fig:results_mswinoc} shows the effectiveness of the proposed
technique when it is applied to an HmWNoC architecture. We analyze
different network configurations in which both the number of radio hubs
and the network size are varied. Specifically, we analyze three
different network sizes
with 256, 576, and 1024 nodes (cores), in which the number of radio
hubs is made to vary from 1 to 16, from 6 to 24, and from 8 to 32,
respectively.

As the number of radio hubs increases, the
energy saving increases because there are more
opportunities for wireless communications, which are the only ones
that benefit from the proposed technique. Further,
as the network size increases, the energy saving becomes more
sensitive to the number of radio hubs. Such behaviour can be explained
by observing that, for a given number of radio hubs, as the network size
increases, the size of the subnetworks increases as well, and the
fraction of communications which involve the use of the wireless
medium decreases. Thus, since the proposed technique affects only the
wireless communications, its impact on energy figures decreases.

\begin{figure}
  \centering
  \includegraphics[width=0.40\textwidth]{pictures/radio_hub_density.eps}
   \caption{Energy saving vs. radio hub density when the proposed
     technique is applied to HmWNoC architectures with different sizes
   under uniform traffic.}
  \label{fig:radio_hub_density}
\end{figure}
To make this trend clearer, Fig.~\ref{fig:radio_hub_density} plots
on the x-axis the \emph{radio hub density} and on the y-axis the
energy saving for each considered network configuration. By
radio hub density we mean the ratio between the number of
radio hubs and the number of nodes (cores) of the network. As can
be observed, as the network size increases, the effectiveness of the
proposed technique increases for the same radio hub density. The
energy saving gap between the different network configurations
increases as the radio hub density increases. Specifically, for large
network sizes (1024 nodes), below a radio hub density of 1\%, the
effectiveness of the proposed technique is low (less than 10\% of
energy saving). Above such threshold, the energy saving rapidly
increases. A similar behaviour is observed for the other network sizes
although with different thresholds. It should be pointed out that the
analysis has been carried out under uniform traffic. The results for
the other traffic scenarios have not been reported for the sake of
brevity and because they led to the same conclusions.
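
To relate the density axis to the configurations listed above, the
radio hub density (denoted here, as an assumption, by $d$) can be
evaluated at the extremes of each range of radio hubs:
\begin{align*}
  \text{256 nodes:} \quad & d = \tfrac{1}{256} \approx 0.39\% \ \text{to} \ \tfrac{16}{256} = 6.25\%, \\
  \text{576 nodes:} \quad & d = \tfrac{6}{576} \approx 1.04\% \ \text{to} \ \tfrac{24}{576} \approx 4.17\%, \\
  \text{1024 nodes:} \quad & d = \tfrac{8}{1024} \approx 0.78\% \ \text{to} \ \tfrac{32}{1024} \approx 3.13\%.
\end{align*}
Under this definition, the 1\% threshold mentioned for the 1024-node
network corresponds to roughly 10 radio hubs.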

\begin{figure}
  \centering
  \includegraphics[width=0.45\textwidth]{pictures/energy_vs_load.eps}
   \caption{\color{red}Normalized energy by throughput vs. injection load under
     uniform random traffic (256-node HmWNoC).}
  \label{fig:energy_vs_load}
\end{figure}
{\color{red}It is also interesting to study the energy saving when the injection
load is made to vary. Fig.~\ref{fig:energy_vs_load} shows the
normalized energy by throughput versus injection load for a 256-node
HmWNoC with and without the application of the proposed technique. As
can be observed, for moderate injection loads, the energy saving
when the proposed technique is used is relevant due to the fact that
the wireless communications account for a significant fraction of the
overall communication energy (\ie, wireless plus wired communication
energy). However, as the injection load increases, the energy
contribution due to the wired communications increases, making less
evident the effectiveness of the proposed technique, which improves
only the wireless communication energy component.}


%% It should be pointed out that, consider networks that
%% involve in a large number of radio hub is not realistic in the mm-wave
%% domain. The experiments above reported  consider  in any case the
%% latter scenario for the sake of simplicity. Results mandatory analyse
%% such cases especially considering  to apply the proposed scheme to
%% future Terahertz~\cite{ganguly_tc10,abadal_ton14} or  Graphene-
%% Based~\cite{abadal_ieeecm13} WiNoC, architectures adapt  to satisfy
%% more aggressive area occupation/performances trade-off  even with
%% larger network.

%------------------------------------------------------------------------------

\subsection{Application Mapping}
The way in which tasks are mapped onto the NoC has a tremendous impact
on performance and power metrics~\cite{sahu_jsa13}. In fact, the
possibility of tuning the transmitting power based on the location of
the destination node can be seen as a new degree of freedom in the
mapping problem which results in new opportunities for energy
optimization. In this subsection we assess the improvement in energy
saving when the GAMAP mapping technique~\cite{palesi_jucs12} is used
in conjunction with the proposed technique. We selected a subset of
benchmarks from the SPLASH-2 benchmark suite that have been simulated
with the Graphite Multicore Simulator~\cite{miller_hpca10} for extracting
the communication patterns. Such communication patterns have then been
used for determining the communication graphs which form the input for
the considered mapping technique.

%% Based on the formulation of the mapping problem stated in
%% Sec.~\ref{sec:mapping}, let us now present some experimental
%% results. We used the applications in the SPLASH-2 and PARSEC
%% benchmarks suites. The benchmarks have been executed on Graphite
%% Multicore Simulator~\cite{miller_hpca10} and the application graphs
%% have been extracted. Then, simulated annealing has been used to map
%% the application graph into the nodes of the network with the objective
%% of minimizing the total communication energy consumption as defined in
%% Eqn.~(\ref{eqn:mapping_energy}).
%% Based on the formulation of the mapping problem stated in
%% Sec.~\ref{sec:mapping}, simulated annealing has been used to map
%% the application graph into the nodes of the network with the objective
%% of minimizing the total communication energy consumption as defined in
%% Eqn.~(\ref{eqn:mapping_energy}).

\begin{figure*}
  \centering
  \begin{tabular}{cc}
    \includegraphics[width=0.45\textwidth]{pictures/power_saving_mapping_iwise.eps} &
    \includegraphics[width=0.45\textwidth]{pictures/power_saving_mapping_mcwinoc.eps} \\
    (a) & (b)
    \end{tabular}
  \caption{Impact of the mapping on energy consumption. Energy saving
    over a traditional wire-line NoC when the proposed VGA controller
    is applied to an iWise64 architecture (a) and to a McWiNoC
    architecture (b).}
  \label{fig:mapping_rnd_vs_sa}
\end{figure*}
Fig.~\ref{fig:mapping_rnd_vs_sa} shows the percentage communication
energy saving (with the wireline NoC as baseline) when the
mapping is optimized. In particular, for both iWise and McWiNoC we
analyzed three configurations: 1) the proposed technique is
not applied and a random mapping is used; 2) the proposed technique is
applied and a random mapping is used; and 3) the proposed technique is
applied and the application mapping is optimized. For the
configurations that use a random mapping, the energy consumption is
measured by averaging the energy consumption over 1,000 random
mappings. As can be observed, on average, optimizing the
mapping in conjunction with the proposed technique improves the energy
efficiency by 72\% and 62\% for iWise and McWiNoC, respectively.
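
To make the metric plotted in Fig.~\ref{fig:mapping_rnd_vs_sa}
explicit, it can be sketched as follows. This is only one possible
reading of the metric: the symbols $E_{\mathrm{wl}}$, $E(m)$,
$\bar{E}_{\mathrm{rnd}}$, $m^{\ast}$, and $S$ are illustrative notation
assumed here. Let $E(m)$ be the communication energy of a WiNoC under
mapping $m$ and $E_{\mathrm{wl}}$ that of the wireline baseline. For
the random-mapping configurations, the energy is averaged over $N =
1{,}000$ random mappings $m_1, \dots, m_N$:
\begin{equation*}
  \bar{E}_{\mathrm{rnd}} = \frac{1}{N} \sum_{i=1}^{N} E(m_i),
\end{equation*}
and the percentage energy saving over the baseline is
\begin{equation*}
  S = 100 \cdot \frac{E_{\mathrm{wl}} - E}{E_{\mathrm{wl}}} \;\; [\%],
\end{equation*}
where $E = \bar{E}_{\mathrm{rnd}}$ for configurations 1) and 2), and
$E = E(m^{\ast})$ for configuration 3), with $m^{\ast}$ denoting the
optimized mapping. Under this reading, $S > 0$ indicates that the
WiNoC consumes less communication energy than the wireline NoC.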

%------------------------------------------------------------------------------

\subsection{Case Study}
\begin{figure}
  \centering
  \includegraphics[width=0.45\textwidth]{pictures/case_study64.eps}
  \caption{Heterogeneous system composed of a multimedia sub-system, a
    MIMO-OFDM receiver, a PiP and an MWD module.}
  \label{fig:case_study64}
\end{figure}
Finally, as a case study, we consider the complex heterogeneous system
shown in Fig.~\ref{fig:case_study64}. The system is composed of a
generic multimedia system that includes an H.263 video encoder, an
H.263 video decoder, an MP3 audio encoder, and an MP3 audio
decoder~\cite{hu_tcad05}; a MIMO-OFDM receiver~\cite{yoon_act06}; a
Picture-in-Picture (PiP) application~\cite{jaspers_tice99}; and a
Multi-Window Display (MWD) application~\cite{vandertol_mp02}. We
have mapped the application on both iWise and McWiNoC and assessed the
energy saving when the proposed technique is used.

\begin{figure}
  \centering
  \includegraphics[width=0.40\textwidth]{pictures/results_cg.eps}
  \caption{Normalized energy consumption for iWise64 and McWiNoC when
    the proposed technique is applied.}
  \label{fig:results}
\end{figure}


Fig.~\ref{fig:results} shows the energy consumption of the
different architectures normalized to that of the wireline NoC. As can be
observed, the proposed technique yields
energy savings of up to 50\% and 48\% when applied to iWise64
and McWiNoC, respectively.

%------------------------------------------------------------------------------

\section{Conclusions}
\label{sec:conclusions}
Emerging communication technologies such as wireless NoCs (WiNoCs) are
considered a viable solution to the scalability and energy
consumption issues of many-core system
architectures. Unfortunately, the transceiver of the radio hub in a
WiNoC accounts for a significant fraction of the overall communication
energy budget. In this paper, we have presented a reliability-aware,
runtime-tunable transmitting power technique for improving the energy
efficiency of the transceiver in WiNoC architectures. The proposed
technique is general and can be applied to any WiNoC architecture. In
this paper, it has been applied to three known WiNoC architectures,
namely, iWise64~\cite{ditommaso_hoti11}, McWiNoC~\cite{zhao_nocs11},
and HmWNoC~\cite{deb_tc13}. The experimental results have shown
energy savings of up to 60\% without any impact on performance
metrics. The hardware overhead, in terms of silicon area, introduced
by the proposed technique is negligible compared to the area of the
transceiver (approximately four orders of magnitude smaller than the
transceiver). We believe that the introduction of the proposed
technique opens interesting scenarios in several directions. For
instance, application mapping strategies might take into account the
specific radiation patterns of the antennas, or design space exploration
techniques might consider the orientation of the antennas as an
additional degree of freedom for application-specific optimization
purposes. {\color{red} Further, since the benefit of the proposed
  technique grows with the variance of the distances between the radio
  hubs, an interesting design opportunity is to optimize the placement
  of the radio hubs for energy efficiency.}

%------------------------------------------------------------------------------
% \balance

\bibliographystyle{IEEEtran} 
\bibliography{bibliography}

%------------------------------------------------------------------------------
\end{document}
